Four tutorials have been accepted by DSAA’2020:
1. Label Noise: Problems and Solutions
2. Computational Approaches to Study Fake News, Misinformation, and Disinformation Diffusion
3. How to Determine the Optimal Anomaly Detection Method For Your Application
4. eXplainable Decision-Support Making
Abstract
With data coming from various sources, many of them unreliable, users face the problem of developing robust algorithms, or of modifying existing ones, so that they become robust to label noise. We will present a study of various aspects of label noise problems and their solutions. We will start with the problem setting of standard 0-1 binary classification and outline some historical background. Three strands of literature will be presented: 1) Identification of label-noise-robust loss functions and the corresponding sufficient conditions. We will also discuss various modifications proposed for surrogate loss functions, such as unbiased estimation and importance re-weighting. These methods are theoretically sound and demonstrate good empirical performance. However, they either assume the true noise rates or estimate them, and the estimates may not be accurate. The noise considered in this setup is either symmetric across all classes or dependent only on the class, and the setting is the empirical risk minimization (ERM) framework. 2) In this part, we generalize the noise setting to instance-dependent noise and the ERM setup to deep networks. In the instance-dependent noise case, a major available result is the noise robustness of the Isotron algorithm. Another approach proceeds via consistency, by which k-NN and SVM are shown to be noise robust. For deep networks, various perspectives have been proposed in addition to the existing ones of showing the inherent robustness of a loss function and of modifying loss functions. One approach is to understand the basic working of neural networks in terms of dimensionality and modify the algorithm accordingly. Another assumes that a small amount of clean data is available and can be used along with the noisy data to train a deep network.
3) In the last part, we will examine the effect of label noise on settings beyond standard classification, such as cost-sensitive binary classification, generative models, learning from positive and unlabelled data, and active learning. For each of these, we will briefly explain the problem and the efforts made to solve it, in terms of either the loss function or the learning approach. Finally, we will conclude by discussing some issues in designing and evaluating label-noise-robust algorithms.
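The loss-modification idea from the first strand can be sketched concretely. The following is a minimal illustration of the unbiased-estimator correction for class-conditional label noise (the kind of surrogate-loss modification mentioned above), not the tutorial's own code; the noise rates `rho_plus` and `rho_minus` are assumed known here, whereas in practice they must be estimated.

```python
import math

def logistic_loss(margin):
    """Standard surrogate loss ell(t, y), evaluated at margin = y * t."""
    return math.log(1.0 + math.exp(-margin))

def unbiased_loss(t, y, rho_plus, rho_minus):
    """Noise-corrected loss: its expectation over the noisy labels
    equals the loss on the clean label (class-conditional noise)."""
    rho_y = rho_plus if y == 1 else rho_minus      # flip prob. of class y
    rho_not_y = rho_minus if y == 1 else rho_plus  # flip prob. of class -y
    num = (1.0 - rho_not_y) * logistic_loss(y * t) \
          - rho_y * logistic_loss(-y * t)
    return num / (1.0 - rho_plus - rho_minus)

# With zero noise the correction reduces to the plain surrogate loss.
assert abs(unbiased_loss(0.5, 1, 0.0, 0.0) - logistic_loss(0.5)) < 1e-12
```

Averaging the corrected loss over the label-flip distribution recovers the clean-data loss, which is what makes ERM on noisy labels consistent under this scheme.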
Presenter
Sandhya Tripathi
Time
Oct. 8, 2020 08:00 AM Canberra, Melbourne, Sydney
Zoom Meeting ID
https://victoriauniversity.zoom.us/j/94594235363?pwd=SVlRNmFzRUlOUHcxOFVnbTd2SGxtZz09
Abstract
An anomaly in a time series is a pattern that does not conform to past patterns of behavior in the series. Anomalies are important to detect as they can indicate events such as pending sensor failures, unexpected environmental conditions, or malicious activity. Unfortunately, there is no single best way to detect all anomalies across a variety of domains; such a methodology is a myth given that time series can display a wide range of behaviors. In addition, what counts as anomalous can differ from application to application. In this tutorial, we introduce a framework that helps you determine the best anomaly detection method for your application based on the characteristics the time series possesses. For example, some anomaly detection methods never adapt after a concept drift, predicting every point afterwards to be an anomaly. Some methods require interpolation of missing time steps beforehand, while others can handle missing or nonuniform time steps innately. Participants will get hands-on experience applying various anomaly detection methods to several datasets exhibiting different kinds of behaviors. We will then discuss how best to evaluate them (precision, recall, F-score, NAB score, etc.) and how to choose an appropriate method for the specific time series' behaviors. We conclude by demonstrating ways to reduce the need for annotation and grid search over anomaly detection parameters by identifying motifs in time series containing confirmed false positives.
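For readers unfamiliar with the evaluation metrics named above, here is a small sketch of point-wise precision, recall, and F-score for labelled anomaly detection; the detector output and ground-truth indices are made up for illustration only.

```python
def point_metrics(predicted, actual):
    """predicted / actual: sets of time-step indices flagged as anomalous."""
    tp = len(predicted & actual)  # correctly flagged anomaly points
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Detector flags steps {3, 7, 9}; ground-truth anomalies are {3, 9, 14}.
p, r, f1 = point_metrics({3, 7, 9}, {3, 9, 14})  # each is 2/3 here
```

Point-wise scoring is the simplest option; window-based scores such as the NAB score additionally reward early detection within an anomaly window, which is one reason the choice of metric is itself application-dependent.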
Presenter
Cynthia Freeman and Ian Beaver
Time
Oct. 8, 2020 11:00 AM Canberra, Melbourne, Sydney
Webinar ID
913 8389 8636
Abstract
The purpose of the XDSM Tutorial is to illustrate state-of-the-art approaches to explainable data mining and interpretable machine learning, to discuss the problems, issues, and current challenges, and to encourage principled research that will lead to the advancement of explainable, transparent, ethical, and fair data mining and machine learning.
Presenter
Riccardo Guidotti, Anna Monreale, Salvo Rinzivillo, Przemyslaw Biecek
Time
Oct. 9, 2020 05:00 PM Canberra, Melbourne, Sydney
Zoom Webinar ID
967 3975 8238
Abstract
After ‘fake news’ was declared the official Collins Dictionary 2017 “Word of the Year”, many scientists from different disciplinary fields started to study how misinformation spreads across social networks and how to fight this phenomenon, which is potentially dangerous for our democracies. Moreover, after COVID-19 was declared a global pandemic, many began to question whether the related ‘infodemic’ would substantially amplify the health risks and socio-economic challenges our world must tackle. This remarkable effort in the scientific community has its drawbacks, however: scholars approaching the problem find a considerable number of papers published in a relatively short period of time, and many different lines of research that spiral out from a number of pre-existing seminal papers published in several, and sometimes quite separate, scientific communities. The main purpose of this tutorial is to provide a wide overview of the many existing lines of research that are becoming consistently popular. Although the selection of state-of-the-art results will necessarily be biased toward approaches that involve computational methodologies, we will provide many references to pioneering and sometimes unexpectedly related work in social and economic science, the physics of complex systems, statistical mechanics, and so on. We will mainly focus on social-network-based models and identification techniques, giving an introduction to the underlying mechanisms that can amplify the spread of misinformation or act as barriers to information diffusion.
In particular, we will focus on homophily, segregation, and polarization phenomena, showing how information cascades can, somewhat counterintuitively, be either facilitated or blocked by the clustered topologies that are so typical of social networks. Segregation can be a facilitator for the diffusion of misinformation, and debates that take place on social media also have a strong impact on the evolution of the network itself. Moreover, the consolidation of echo chambers and the emergence of bots are rapidly changing the way we interact with others, forcing our societies to rethink themselves: the way we vote, freedom of speech, censorship policies, and so on.
We will also show how users’ stances in polarized political debates may be tightly connected to the underlying structure of relationships, and how algorithms can provide more efficient tools to test structural balance in the signed networks that describe polarized communities. We will explain why a multidisciplinary approach is needed: scientists require computational tools from traditionally different areas of computer science. Complex network analysis, computational linguistics, and machine learning provide many methodologies and techniques, and it is not always trivial to use them adequately.
This tutorial will also present some simple epidemiological-like models that can help us understand the spread of misinformation in segregated networks, and show some possible what-if analyses that can be performed to select the best debunking strategy, assuming we had the hypothetical power to convince some individuals in the network once we have detected their role in the overall social network topology.
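To give a flavor of such what-if analyses, the following is a toy, discrete-time SIR-style sketch of rumour spread: "susceptible" users can be "infected" by a piece of misinformation and later "recover" (e.g. after debunking). This well-mixed mean-field model ignores the network segregation the tutorial discusses, and the rates `beta` and `gamma` are illustrative assumptions, not values from the tutorial.

```python
def simulate_sir(n, i0, beta, gamma, steps):
    """Mean-field SIR on a well-mixed population of n users.
    beta: per-step transmission rate; gamma: per-step recovery rate."""
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n  # contacts with current sharers
        new_recoveries = gamma * i         # users who stop believing/sharing
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# A what-if: halving beta (e.g. via a debunking campaign) shrinks the
# peak number of simultaneous believers.
peak = max(i for _, i, _ in simulate_sir(10000, 10, 0.4, 0.1, 200))
peak_debunked = max(i for _, i, _ in simulate_sir(10000, 10, 0.2, 0.1, 200))
```

Network-aware variants run the same dynamics on a graph, which is where clustered, segregated topologies start to facilitate or block the cascade.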
Presenter
Giancarlo Ruffo and Alfonso Semeraro
Time
Oct. 9, 2020 08:00 PM Canberra, Melbourne, Sydney
Zoom Webinar ID
967 3975 8238